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ABSTRACT 



A Two-Dimensional Fast Recursive Least Squares (2-D FRLS) algorithm is pre- 
sented using a geometrical formulation based on the mathematical concepts of vector 
space, orthogonal projection, and subspace decomposition. 

By appropriately ordering the 2-D data, the algorithm provides an exact least- 
squares solution to the deterministic Normal equations. The method is further extended 
to the general FIR Wiener filter and to ARMA modeling. The size and shape of the 
support region for both the MA and AR coefficients of the filter can be choosen 
arbitrarly. The ARMA parameter estimation problem is also considered for the case 
when the system input is not available. 

Computer simulations are presented to illustrate the applications of the algoritm for 
2-D parameter estimation, system identification and image coding. 




TABLE OF CONTENTS 



I. INTRODUCTION 1 

A. PROBLEM FORMULATION 1 

B. THESIS OVERVIEW 2 

II. ADAPTIVE FILTERS 3 

A. PERFORMANCE CRITERIA 3 

B. LEAST MEAN SQUARE (LMS) ALGORITHM 4 

1. 1-D LMS 4 

2. 2-D LMS 5 

C. I-D RECURSIVE LEAST SQUARES (RLS) 6 

III. 2-D FAST RECURSIVE LEAST SQUARES 11 

A. 2-1) PREDICTION FILTERS 11 

B. 2-D DATA ORDERING 13 

C. PREVIOUS DATA/ PAST OBSERVATIONS 14 

1. (M + l) Channel Analogy 14 

2. Past data 16 

D. PROBLEM REFORMULATION 16 

E. VECTOR SPACE CONSIDERATIONS 20 

1. Projection Matrix Time Update 22 

2. Angle Parameter 23 



IV 



F. AUXILIARY FILTERS 24 

1. (M + l) Channel Forward Prediction Filter 24 

2. (M + l) Channel Backward Prediction Filter 26 

3. Gain Transversal Filter 29 

G. FILTER UPDATE PROCEDURES 31 

1. Transversal Operator Update 31 

2. 2-D Filter Update 33 

3. Gain Filter Update 35 

4. Forward Filter Update 37 

5. Backward Filter Update 39 

6. Angle Parameter Update 41 

7. Inverse Matrix Update 42 

H. ALGORITHM SUMMARY 43 

1. Computational Complexity 43 

2. Initial Conditions 44 

3. Iteration at time n and Required Arithmetic Operations ... 44 

IV. EXTENSIONS, APPLICATIONS AND RESULTS 47 

A. GENERAL FIR WIENER FILTER 47 

B. ARMA MODEL 49 

C. ARMA MODELING WITH UNKNOWN INPUT 51 

D. SIMULATION RESULTS 52 

1. 2-D System Identification 53 

2. 2-D Parameter Estimation 57 

3. Image Coding 62 

V. CONCLUSIONS AND RECOMMENDATIONS 74 



v 



APPENDIX. PROJECTION MATRIX UPDATE 



LIST OF REFERENCES 



INITIAL DISTRIBUTION LIST 



LIST OF FIGURES 



1. First Quadrant (N + l)(M-fl) Filter 12 

2. Data Ordering 14 

3. M + l Channel Analogy 15 

4. Past Data 17 

5. Forward Filter Mask 25 

6. Backward Filter Mask 27 

7. Modeling with known input 51 

8. Modeling with unknown input 52 

9. System Identification MA Model 54 

l 

10. System Identification MA Model (2-D LN1S) 55 

11. System Identification ARMA Model (AR coelT. stationary data) ... 56 

12. System Identification ARMA Model (MA coelT. stationary data) ... 57 

13. System Identification ARMA Model (AR coelT. nonstationary data) 

A = 1.0 58 

14. System Identification ARMA Model (MA coeff. nonstalionary data) 

A = 1.0 59 

15. System Identification ARMA Model (AR coeff. nonstationary data) 

A = 0.95 60 

16. System Identification ARMA Model (MA coeff. nonstationary data) 

A = 0.95 61 

17. Parameter Estimation AR Model 62 



Vll 



IS. Parameter Estimation AR Model (2-D LMS) 63 

19. Parameter Estimation ARM A Model (AR coefT.) 64 

20. Parameter Estimation ARMA Model (MA coefT.) 65 

21. linage 1 Original 66 

22. Image 2 Original 66 

23. Predictive coding (taken from (Ref. 6] ) 67 

24. Image 1 Reconstructed 2-D LMS (AR) 67 

25. Image 1 Reconstructed 2-D ERLS (AR) 68 

26. Image 1 Reconstructed 2-D FRLS (ARMA) 68 

27. Image 1 Error 2-D LMS (AR) 69 

28. Image 1 Error 2-D FRLS (AR) 69 

29. Image 1 Error 2-D FRLS (ARMA) 70 

30. Image 2 Reconstructed 2-D LMS (AR) 70 

31. Image 2 Reconstructed 2-D FRLS (AR) 71 

32. Image 2 Reconstructed 2-D FRLS (ARMA) \ ^ 

33. Image 2 Error 2-D LMS (AR) 72 

34. Image 2 Error 2-D FRLS (AR) 72 

35. Image 2 Error 2-D FRLS (ARMA) 73 



viii 



ACKNOWLEDGMENT 



I wish to express my sincere appreciation to Professor Charles W. Therrien, my 
thesis advisor, for his assistance, patience and all the encouragement provided during 
the preparation of this thesis. I also want to thank Professor Murali Tummala, my 
second reader, for his availability and support. Last, but certainly not least, I wish 
to thank my wife Maria do Ceu who made our stay in Monterey wonderful. 



IX 



I. INTRODUCTION 



Adaptive algorithms have been used successfully for many years in a wide range 
of digital signal processing applications involving non-stationary data. In these appli- 
cations it is desired to follow closely the variations of the parameters characterizing 
the process, by updating (in real time if possible) estimates of these parameters as 
soon as new data is available. Real time implementation of these algorithms only 
recently became possible with the latest capabilities of the VLSI technology and is 
partly a result of the development of computationally affordable algorithms based on 
a very elegant mathematical formulation. This formulation is known as fast recursive 
least squares (FRLS) and is based on a geometric approach. The derivation of al- 
gorithms based on the geometric approach uses the concepts of linear vector spaces, 
orthogonality, projection matrices, and their relation with least squares prediction 
[Ref. 1. 2, 3]. Motivation for the development of similar algorithms to process two- 
dimensional (2-D) data is a consequence of the very interesting results reported lately 
in the literature on adaptive filters for one-dimensional ( 1 -D ) signals in what concerns 
their reduced complexity and excellent behavior even in non-stationary environments 
[Ref. 4, 5]. The development of Fast RLS algorithms for 2-D data based on the 
geometric approach is what is addressed in this thesis. 

A. PROBLEM FORMULATION 

A major problem with the extension of Fast RLS algorithms to two dimensions 
is that causality is not inherent in 2-D systems. This problem was overcome by 
associating the past of a 2-D signal with the region of support of a recursive filter mask 
(usually quarter plane or non-symmetrical half plane). By appropriately ordering the 
2-D data, a two-dimensional Fast Recursive Least Squares (2-D FRLS) algorithm 



1 



is developed using a geometrical formulation where the vector spaces and all the 
notions associated with them are defined to reflect the 2-D nature of the data. An 
exact least squares solution to the deterministic Normal equations is provided for 
all of the all-pole (AR), all-zero (MA), or pole- zero (ARMA) models. The size and 
shape of the support region for both the MA and AR coefficients of the filter can 
be chosen arbitrarily as long as the overall system is recursively computable. The 
ARMA parameter estimation problem is also addressed for the case when the system 
input is not known. 

B. THESIS OVERVIEW 

The remainder of this thesis is organized as follows. Chapter 2 provides a sum- 
mary of the most common adaptive filtering techniques. Only brief reference is made 
the Least Mean Square (LMS) algorithm due to its simplicity and slower convergence 
properties. However a 2-D version of this technique that was recently reported in the 
literature is mentioned. Most of the chapter reviews the basic ideas of the Recursive 
Least Squares (RLS) algorithm. This provides preparation for chapter 3 where a fast 
2-D version of this algorithm is developed in detail. 

Chapter 4 is dedicated to applications of the new algorithm to the 2-D problems 
of systems identification, image coding, and parameter estimation. Different models 
(AR, MA, and ARMA) are considered. 

Chapter 5 summarizes the results obtained and suggests some possible improve- 
ments for the new algorithm. Mathematical derivations related to the geometrical 
formulation that are essential to the method, but too tedious to be inserted in the 
body of the thesis, are grouped in the Appendix. 
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II. ADAPTIVE FILTERS 



A. PERFORMANCE CRITERIA 

In adaptive filtering the performance criterion is usually based on the minimiza- 
tion of a cost function dependent on the filter coefficients to be determined. The most 
common performance criterion is the minimization of the mean square error (MSE) 
associated with the signal to be estimated [Ref. 5, 6, 7]. In particular, if we consider 
a random process y(n) and a predictive filter of the form 

M 

y(r>) = h n (k)y(n - k) (1) 

i t=i 

where y(n) is the predicted value and h n (k ) are the filter coefficients, then the pre- 
diction error is defined by 



e(n) = y(n) - y(n) (2) 

and the MSE becomes 

c = E[e 7 (n)] (3) 

where E is the expectation operator. For stationary data, this quantity is a convex 
quadratic function of the filter coefficients h n (k) and attains its minimum at a point 
where the partial derivatives with respect to each of the filter coefficients are simul- 
taneously equal to zero. Substituting (1) and (2) in (3) and simplifying, we obtain 
the desired expression 

r\ 

dh j kj = -2E[e(n)y(n ~ k)] = 0 for k = 1 , M (4) 

The dependence of the performance criterion on the filter coefficients can be in- 
terpreted in terms of a multidimensional convex surface with a unique minimum. This 
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surface is called the error-performance surface. The coefficients associated with the 
minimum mean-square error are obtained by solving the set of simultaneous equations 
in (4) which in this case are known as the Normal equations. 

B. LEAST MEAN SQUARE (LMS) ALGORITHM 
1. 1-D LMS 

The Normal equations can be solved by brute force using matrix inversion 
or by computationally faster methods such as the Levinson-Durbin algorithm for 
Toeplitz matrices. However here, we are interested in a method called the steepest 
descent , which provides an iterative solution to the Normal equations [Ref. 5, 6, 7, 8]. 
We start with a initial set of filter coefficients and a corresponding point on the error 
performance surface. We then compute the gradient vector formed by the partial 
derivatives of the mean-squared error with respect to each of the filter coefficients at 
that point. Using (4) the gradient vector can be expressed as 

V(n) = -2£|e(n)y(n)] (5) 

where y(n) is a M x 1 vector that contains the data covered by the filter mask at 
time n 

y(r?) = [y(n - l),y{n -2 ),...,y(n - M)] T (6) 

Finally we update the coefficients by changing them in a direction opposite to that 
of the gradient vector using a predefined step size y 

— n+i = hn + = h n + yE[e(n)y(n)} (7) 

where h n is a M x 1 vector that contains the filter coefficients at time n 

h„ = [/.„(l),A„(2),...,A n (A/)] T (8) 

The inconvenience of this approach is that it requires an exact measure of the gradient 
vector at each iteration and the gradient involves statistical expectation. Usually the 
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statistics of the data are not available and must be estimated from the raw data. 
Thus to obtain a really accurate estimate of the gradient is quite cumbersome and 
computationally expensive. 

A practical method to estimate the gradient vector in a simple manner 
directly from acquired data is used in the Least Mean Squares (LMS) algorithm of 
Widrow [Ref. 5]. For this, the expectation in (5) is ignored and an instantaneous 
estimate of the gradient vector is taken to be 

V(n) = — 2e(n)y(n) (9) 

The update for the filter coefficients then has the form 

h n+1 = h n + ^/i[-V(n)] = h n + ne(n)y(n) (10) 

This method is quite attractive for a wide range of applications since it requires no 
matrix inversions, correlation function estimation, or (actual) gradient computation, 
and hence has low computational complexity. However its convergence is relatively 
slow. A detailed derivation and analysis of the LMS properties can be found in [Ref. 
5. 6]. 

2. 2-D LMS 

The extension of the LMS algorithm to 2-D signals is straightforward. 
Reference [Ref. 9] gives a detailed derivation and shows that the analysis presented by 
other authors for the 1- D LMS is also applicable to the 2-D version of the algorithm. 

The final form of the 2-D LMS algorithm is very similar to (10) but the 
vector containing the 1-D filter coefficients h n is substituted by a matrix W ; con- 
taining the 2-D filter coefficients. The instantaneous estimate of the 2-D gradient 
uses the data matrix X ; formed by the 2-D input samples covered by the 2-D filter 
mask at iteration j. For a N x M 2-D sequence, at sample y(n, m); if j is the linear 
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scanning index 



j = mM + n (11) 

then the algorithm takes the form: 

W J+ , = W, + ^(-V(n)] =W,+ /,e,X, (12) 

A separate derivation of this algorithm was also presented in [Ref. 10], together with 
some examples of its performance through computer simulation of a noise canceler 
and an adaptive line enhancer applied to an image processing problem. 

C. 1-D RECURSIVE LEAST SQUARES (RLS) 

In the adaptive methods presented above the need to solve the Normal equa- 
tions appears as a consequence of the minimization of a statistical cost function, 
the mean-squared error. However the implementation instead uses the actual data to 
compute errors and update the coefficients. This is the main cause of the performance 
deficiencies encountered when implementing this algorithm. 

Another possible approach is to base the performance criterion upon error mea- 
sures derived from the actual data. This class of techniques is known generally as 
Least Squares (LS) algorithms. 

The LS algorithm is designed to find the set of filter coefficients that minimize 
the cumulative sum of squared errors. 

e ( n ) = E e2 (0 ( 13 ) 

t=i 

Although this seems very similar to the previous performance criterion, it results in a 
set of deterministic Normal equations whose solution provides filter coefficients that 
are exactly optimal, according to (13), for the acquired data instead of statistically 
optimal for a class of data as in the case of the steepest descent methods [Ref. 6]. 
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To formulate this problem we once again differentiate the cost function with 
respect to the coefficients. However since we here use summations instead of expec- 



tations the result is 



^^r = -2r,(0,*) + 2X;M')r»(t.O = 0 for jfc = 1 A/ (14) 

Ou n yk ) 



where we define the deterministic correlation function r n (k,l) as 

n 

r n(M) = J2y( n ~ k)y( n - 0 f° r = o , ...,m 

»=1 



(15) 



This set of M simultaneous equations are the deterministic Normal equations. The 



(16) 



equations are written in matrix form 


as 






R(n 


)lln = E (n) 




where R(n) is the M x M deterministic correlation matrix wi 




r„(l.l) 


r n(T 2) ••• 




R(n) = 


r„(2,l) 


r„(2.2) ••• 


r n (2,M) 




r n (M, 1) 


r„(A/,2) ••• 


r n (M,M ) 



(IT) 



and r(n) is the M x 1 vector of deterministic cross-correlation terms between the 
desired filter response and the filter inputs. 



E(”) = [r n (0,l),r n (0,2),---,r n (0 t n/)] 3 



(18) 



If R(n) is nonsingular then the solution to the Normal Equations is formally 

hn = R _1 (n)r(n) (19) 

This brute force solution requires on the order of M 3 arithmetic operations. A better 
approach is to use a method known as the recursive least squares (RLS). This uses 
the Matrix Inversion Lemma [Ref. 5] to update the inverse correlation matrix and the 
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cross correlation vector as new data is acquired and thus to compute h n recursively. 
The resulting expression for R -1 (n) is 



„ a -1 (n — l)^(n)^ r (n)R~ 1 (n — 1) 

R ( ’’ ) = R ( ”- 1) - 1 + y T (n)R -1 (n — l)y(rc) 

By defining the a priori error e(n\n — 1) as 



( 20 ) 



e(n|n - 1) = y{n) - y T (n)h n _ 1 



(21) 



and the gain vector k (n) as 



k(??) = 



R x (n - l)y(n) 



1 + y T (n)R -1 (n - l)y(n) 



we can rewrite (20) as 



( 22 ) 



R-‘(n) = R-'(n - ]) - k(« jy T (r> )R -1 ('i - 1) 



(23) 



If we then use r(?}) = r(n — 1) 4- y{n)y_(n) and substituting in (19) the desired update 
for coefficient vector h n is found to be 



h n = h n _j +k(n)e(n|n- 1) (24) 

To update the coefficient vector as new data is acquired all we must do is to compute 
the last four equations assuming that all the parameters with index n — 1 are available 
at time tj. Since the non-singularity of the deterministic correlation matrix is a 
requirement for the solution of the problem, we must start with the initial condition 

R -1 (0) = c -1 Ia/ x m (25) 

where c is a small positive constant. It also customary to initialize all the components 
of the coefficient filter h n to zero. 

The RLS algorithm is computationally more expensive than the LMS. The RLS 
algorithm requires a total of 3il/(3 + M)/2 multiplications/divisions per iteration, 
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while the LMS algorithm requires only 2 M + 1. On the other hand the convergence 
performance of the RLS algorithm is much superior [Ref. 5, 6]. A detailed derivation 
of this algorithm and its overall performance can be found in many references such 
as [Ref. 5, 6, 8]. 

An enormous reduction of the computational complexity of the RLS method 
is obtained by using a geometrical formulation in its derivation. The computational 
complexity of the algorithm is reduced to approximately 6 M arithmetic operations. 
The geometrical approach also provides an interesting interpretation of the prediction 
problem in terms of the concepts of vector spaces and orthogonality. Since we will 
use an expanded version the geometrical formulation to derive the extension of this 
method to 2-D signals, and the derivation is lengthy, we will not derive the 1-D method 
here. However a very comprehensive explanation of the geometrical approach for 1-D 
signals can be found in [Ref. 6]. 

For the case of nonstationary data it is frequently advantageous to incorporate 
a forgetting factor in the cost function 

e(n) = \ n ~ l e 2 (i) for 0 < A < 1 (26) 

t=i 

The interpretation of this forgetting factor can be understood as an exponential win- 
dowing of the data in a fashion such that the most recent data has a heavier influence 
in the cost function to be minimized. The fast algorithm is mathematically equivalent 
to the RLS, hence its stability is guaranteed in theory for any possible forgetting factor 
A [Ref. 7]. However the efficiency of this class of algorithms is a result of the reduced 
number of variables used to represent quantities such as the inverse deterministic 
correlation matrix. Due to the finite precision arithmetic used, the representation is 
only approximate. As a result the accumulation of round-off errors can set off insta- 
bility of the algorithm. The sensitivity of some quantities to round-off error are highly 
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dependent on the forgetting factor used. This imposes a lower bound on A. Typical 
values for A are in the range: 



0.95 < A < 1.0 



(27) 
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III. 2-D FAST RECURSIVE LEAST SQUARES 



In this chapter we present the derivation of the 2-D FRLS algorithm. As men- 
tioned before we use the geometrical formulation to obtain a fast, computationally 
efficient algorithm. 

We start by introducing a convenient notation that closely follows the one used 
by Alexander [Ref. 6] for the geometric derivation of the 1-D FRLS. This is followed 
by a brief set of vector space considerations that are the basis of the problem solution. 
Next some auxiliary filters that use the same data as the 2-D filter are introduced. 
The key to this method is to find a relation between the parameters of these filters 
that permits the recursive update of all of them as soon as new data is available. 

A. 2-D PREDICTION FILTERS 

The method to be described applies to a general 2-D prediction filter of the form 

y{n i.n 2 ) = -j) ( 2s ) 

* J 

with (z,j) defined in a region that allows the 2-D AR model related to the prediction 
error process to be recursively computable (ex: quadrant or non-symmetric half plane 
support). The recursive computability is a requirement for applications where inverse 
filtering is used to recover the 2-D data sequence from the estimated error sequence 
as in most of the image coding schemes. To be specific and develop clear notation we 
will assume a first quadrant (A r + 1) x (M + 1) filter of the form 

N M 

= ^ (0,0) (29) 

1=0 j = 0 

and a 2-D data sequence with I\ x L points as shown in Figure 1. 
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l-l 




Figure 1. First Quadrant (N+1)(M+1) Filter 

The prediction error e(rii,n 2 ) = y{n\,n 2 ) — y(n\,n 2 ) can be expressed using 
vector notation as 



e{n), n 2 ) = !/(n,,n 2 ) - y T (n,,n 2 )a 



(30) 



where 



y(ni,n 2 ) = [y(m - 1, n 2 ), ••• ,y(n } - N, n 2 ), y(n u n 2 - 1), •• • 

• • -,2/(71! - N,n 2 - 1),- • • ,2/(77! - A^, 77 2 - 2), • • • 

— — A/), — ,y(ni - A', 77 2 - M)] r (31) 

is a (N + 1)(A/ + 1) — 1 dimensional vector formed by the elements covered by the 
filter mask, ordered along rows, and 

a = [aio, • • • , flA'o, aoi , • • • , a/vi , • • • , aoA/ 5 ‘ • * , «na/] T (32) 

is the vector of 2-D filter coefficients with the same ordering. We assume for now that 
y(k,l) = 0 for k < 0 or / < 0. 
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W hen performing linear prediction along one of the possible directions of recur- 
sion (e.g., along rows), and all of the data necessary for each prediction is available, 
the optimal filter (least squares criterion) up to point (n 1 .r? 2 ), will be defined as the 
one having a set of coefficients a,j(77 1 ,n 2 ), ( 0 < i < N , 0 < j < M , (i,j) / (0,0) ) 
that minimizes the sum of the squared errors along that recursion direction. 

c(wi, tj 2 ) = £[e(?,n 2 )] 2 + 52 5Z[e(*>i)] 2 ( 33 ) 

t =0 1=0 j —0 

B. 2-D DATA ORDERING 

One question that arises whenever we deal with finite extent sequences is what 
to do when we approach the boundaries of the 2- D data sequence and the prediction 
mask needs to cover points that lie outside of the region where the data is defined. 
One approach is to set the points outside of this region (i.e., the boundary conditions) 
to zero. The inconvenience of this approach is that the boundary conditions depend 
not only on the extent of the 2-D sequence but also on the shape of the filter mask 
and this can lead to additional complications [Ref. 11]. In addition, when we reach 
the end of a row and we start a new one, the data under the mask is almost all reset 
to zero. This causes a strong discontinuity in the process. 

An alternative approach is to assume that, although the 2-D data we process 
may not be stationary, the statistical properties of the data do not vary too rapidly, 
and so to use the data at the end of one row as the initial condition for prediction 

along the next row as is shown in Figure 2. This appears to be at least as reasonable as 

the first approach. It will be seen later that this approach also has several advantages 
in deriving an algorithm based on the geometrical approach. From a practical point 
of view, it is as if we fold the 2-D data plane and form a cylinder with perimeter 
equal to A', but the data rows instead of folding into themselves, are misaligned by 
one row. This allows the prediction to be performed along rows with the 2-D mask 
moving from bottom to top in a helical fashion. 
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Figure 2. Data Ordering \ 

C. PREVIOUS DATA/ PAST OBSERVATIONS 

Performing linear prediction on a 2-D signal along rows implies that the new 
data comes only from a strip with the width of the filter mask (A/ + 1). This suggests 
an analogy with the (A/ -f 1) channel 1-D prediction problem. 

1. (M+l) Channel Analogy 

The (A/ + 1) rows of the data strip can be viewed as (M + 1) channels of 
a 1-D signal. To support this idea, define a linear index n such that 

n = 7 ? 2 x K + rij (34) 
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Then the 2-D data sequence ,v(n 1 ,?? 2 ) can be expressed in terms of the data in the 
first channel as 

y(n,,n 2 ) = yi(n) (35) 

We can also express the data in the other channels in terms of t/i(n) as 

j/,(n) = t/j(rz — (i — 1)A') for t = + 1 (36) 

We will predict along the first channel using data from all channels defined by the 

2-D filter mask. A consequence of this approach is that the length of data used from 
each channel depends on the shape of the 2-D mask. Figure 3 shows the particular 
case of a Quarter Plane mask. In order to predict t/j(n) = y(??i,?? 2 ), the data used 
is formed by N samples of channel 1 (y Q and (N + l) samples of channels 2 (t/ 2 ) to 

M + 1 (t/a/+i )- This requires a total of (A r + 1) x ( M + 1) — 1 coefficients as in the 

2-D mask. 




Channel 

Channel 

Channel 



Channel 



1 

2 

3 

M+l 



Figure 3. M+l Channel Analogy 
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2. Past data 



The multichannel analogy will be used to define what we call the past data 
and the observation data because the notion of multichannel prediction will be useful 
later. From now on we will drop the 2-D notation and use the index n associated 
with the recursion along the rows. 

Begin by defining the (n + l)-dimensional vector 

y,(rc) = Lv.(0),j/,(1), ,3/.(n)] T 1 <i < M + 1 (37) 

that contains data in channel i up to the point n. Further define the delay operator 
z~ k such that 



2 *£,•(”) = [0.0, • - - , 2 /, (0),t/,(l ),•••, yi(n-k)] T (38) 

is a (t? -f- l)-dimensional vector that contains the data of y { (n) delayed by k samples 
and pre-windowed. We call y (r?) the observation data since it contains the data 
sequence we desire to predict. The past data w r ith respect to y\(n) is formed by all 
of the points y,(j) such that (2 < i < M + 1, 0 < j < n) or (i = 1,0 < j < n — 1) 
as shown in Figure 4. Note that different channels have data in common. With the 
new notation defined we are ready to reformulate the problem. 

D. PROBLEM REFORMULATION 

We start by redefining (31), the vector that contains the data covered by the 
2-D filter mask, as 

y hN (n) = [yi(n - 1),- ■ ■ ,yi(n - N), j/ 2 (n), • • • , yw+i(n), • • • , yM+i(n - A r )] (39) 

where the subscript (1,N) denotes the fact that the data in the first channel appears 
delayed by 1 to N samples. We want to find a prediction filter of the form 

yi(”) = yf iN (n)a(n) (40) 
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Figure 4. Past Data 



with 



a('i) - [«io('0' ' ' • ,«no(n)> a 0 i, • • • • • ■ ,a 0M (n), ■ ■ -,a NM (n)] T (41) 



that minimizes the sum of squared errors 

f i( 7 0 = X>i(01 2 

1=0 

where the prediction error given by 

e iH = J/i(n) - yi(n) 
This can be written in vector notation as 



(42) 



(43) 



£ l(") = ej‘(n)g,(n) 



(44) 



where 



£i(») = y,( n )-y,(«) 



(45) 
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and 



y x ( n ) = Y ltA '(n)a(n) (46) 

where Y lA ’(7?) is a data matrix formed by the data covered by the 2-D mask from 
the origin up to time n. The matrix Yi^n) can be written as 

yy°) 
yyi) 



Y lf .v(n) = 



(47) 



yf>) J 

or using a different partition. Y liA -(7?) can alternatively be written in terms of the 
data in the M -4- 1 channels and their delayed versions (37) and (38) as 

y 1iA ’( 7?) = U -1 Xj(n ),••■, s -A y 2 ( n )’ z_1 y 2 ( n )’ ' ' ' ' ' 



X A/+ »] (48) 



As will be shown later, a necessary initial condition for the geometric formulation 
to work is that we start at a sample t/i(0) such that v (0) = 0. That is, the 
initial conditions for the data under the 2-D mask must be zero. Now let us proceed 
with the minimization of (42). The least squares solution for a(n) is given by the 
pseudo-inverse: 

a(») = (Yf. lv .(n)Y,, w ( n ))' 1 Y?j,(n)j- 1 (n) (49) 

where (y£ a -(»)Yj, w (j))) can be interpreted as the inverse of a 2-D deterministic 
correlation matrix and Yf [^(n)y i (n) can be interpreted as the vector of the deter- 
ministic cross-correlations between the observation data and the past data. A new 
expression for e^T?) is obtained substituting the solution for a(n) in (46) and using 
the result in (45). This yields 



£i(«) = £i( n ) - Pi.ArMy^n) 

= (I — P i,Ar(n)) y,( n ) (50) 
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where 



Pi.K(n) = Y 1> -( n )(Y^.(")Y,.K(n))" , Y?; A .(n) (51) 

is interpreted as the projection matrix that projects vectors into the subspace spanned 
by the columns of Y Jr ;v(n). (Note here that it is assumed that 
exists.) Also define 

Pfjv(n) = I-P,,w(n) (52) 

as the orthogonal projection matrix associated with the same subspace. 

Both the L.S. estimate of ^(rc) and the prediction error can now be expressed 
in terms of the projections matrices. The estimate y^n) is the projection of y (n) 
onto the subspace spanned by the previous data 

£i(") = Pi,N(rc)yj(”) (53) 

The error £](??) is orthogonal to the estimate y ; (??). 

£i(”) = p i 1 ,N( n )yi( n ) ( 54 ) 

Next, we define the operator K \,s T (n) as 

K,.„(n) = (ylMYMn))-' Yf^n) (55) 

hence 

a(n) = Ki, iV (n)y l (n) (56) 

Ki ^’(??) can be interpreted as the operator that computes the best LS filter a(??) for 
predicting y^n) given the data set Yi^n). Now since y^n) can be obtained from 
y^ (n — 1) as soon as ?/i(n) is available, if we find an efficient way to get Kj^n) from 
Ki,A'(n — 1) then we will be able to update a(n — 1) to a(n). 
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E. VECTOR SPACE CONSIDERATIONS 



Before going any further we must present some concepts and relations associated 
with vector spaces that will be needed later on. To make the results generic, in this 
section our data matrix is called U(n). This generic data matrix can represent the 
matrix Y lt /v( 7? ) defined in (47) and (48) and other similar matrices that are defined 
later in this chapter. Then following the definition of (51) and (52), the projection 
matrix associated with U(??) is 

Pu (n) = U(n) (u T (n)U(n))" 1 U T (n) (57) 

and the orthogonal projection matrix is 

Pu(») = I-Pu (n) (58) 

The columns of U(n) are formed by the vectors that span the vector space associated 
with Pu(n); hence when we compute Pu(n)U(n), we have 

Pu(n)U(n) = U(n)(U r (r?)U(n)) _1 U T (n)U(n) = U(n) (59) 

v v 

I 

It follows from this that the projection matrix is idempotent 

Pu(")Pu(") = Pu( 7 OU(n)(U T (n)U(n))“ 1 U T (n) = P u (n) (60) 

- ' v ' 

U(n) 

The following relations also follow from the definition of Pu( n ) and Pu( n )- 

Put") = Pul 77 ) (61) 

Pu(n)Pij(") = PuOO ( 62 ) 

Pu T (») = Pul 77 ) (63) 

Pu( T7 )Pu( 77 ) = 0( n +l)x(n+l) (64) 

All of these relations are easy to prove. 
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Now let us append a matrix V (i.e., more columns) to the matrix U(n). The 
columns of V do not need to be orthogonal to the subspace spanned by U(n). hence 
the subspace spanned by [U(??),V] is the same as the one spanned by [U(n),W], 
with 

W = Pu( n )V (65) 

Then we define 

Puv(n) = Pu(n) + Pw(”) (66) 

= Pu(n) + Pu( n )V ([Pu( n )V] T Pu( n )V) _1 V T Pu(n) 

If we note that P|jy(n) = I — Puv( n ); then it follows that 

Ptv(n) = Ptr(.i)-Pi(n)V([Pi 5 (n)V] T Pi(n)V)' 1 V T Pi(n) (67) 

These results also are valid when V is a vector (i.e., a matrix with a single column). 

Some other useful relations with generalized vectors or matrices y and z which 
follow immediately from (66) and (67), are 



Puv(n)y = Pu( IJ )y + Pu( (, )V ([Pu(n)V] T Pu(n)V) _1 V r Pu(n)y (68) 

PuvWy = Pu(n)y-Pu(n)V([P6( n )V] T Pi(r,)v)' l V’'Pi)(n)y (69) 

z T Puv(n)y = i I P»(n)y (70) 

+ z T P{j(n)V ([Pi(n)V] T Pij(n)V)'' V T Pt(„)y 

zTp uv(”)y = z T Pu( n )y ( 71 ) 

- z T Pt(n)V {[Pu(n)Vj T Py(n)v) _1 V T P6(„)y 

These relations with the appropriate choices of U(n),V,y,z will be the basis of 
several recursions needed later. 
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1. Projection Matrix Time Update 

Let us partition the data matrix U(n) as 



U(n) 



U(n - 1) 



( 72 ) 



where u T is the last row of U(n). We can see from (47) that this is a valid repre- 
sentation when U(n) is taken to be Yj ,A, r (n). In this case u T corresponds to y^ v (n). 
It is seen that U(n) is formed by the columns of U(n — 1) with one more dimension 
appended, the components of u T . Now define a (n + l)-dimensional vector 7r(n) called 
the unit time vector [Ref. 6] 



2L(») = [0,0,0. • • ■ ,0,0, 0, l] r 



(73) 



and append it to the data matrix U(n) to form [U(n),7r(n)]. It will now be shown that 
the subspace spanned by this new matrix contains not only y^n) but also y^n — 1). 

To see this, proceed as follows. We know that if y^n) lies in the subspace 
spanned by U(n). then appending 7r(n) will not change a thing. Using (68) with 
V = 7 r(??) and y = y,(n) we obtain 



P Ur(n)y i (") = p u(”)y,( n ) ( 74 ) 

' 

+ p u(n)£(n) ([Pu( ri M n )] Tp u( n M n )) * IL r (») Pu(^)y ] (^) 

0 

To see that y t (n — 1) also lies in this subspace, note that since 7r(«) has only its 
last component non-zero, a linear combination of the vectors in [U(n), 2 r(n)j can be 
used to obtain a matrix whose columns span the same subspace as the columns of 
U(n - 1). 

It can be shown that p u r (n) has the particular form 



Pu £ (n) 



p u(n — 1) 0 

0 T 1 



(75) 
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(a detailed derivation of this result is presented in the Appendix). Further recall that 
y^n) is a vector formed by all the data from the origin up to point n. Thus (75) 
provides a way to decompose a projection matrix into past and present components. 



p u £ Nyj(») = 



+ 



Pu(n-l) Q 




- i) 


1 

0 

01 




yi(”) 



O Q 




- 1) 




- 1) 


0 T 1 




V\{n) 




y i{n) 



2. Angle Parameter 



(76) 



The vector 77 ( 77 ) also provides a way to quantify the change of subspaces 
when we update U(r? — 1) to U(??). First note that the inner product of two vectors 
gives the cosine of the angle between them multiplied by the product of their lengths, 
and also that the length of a vector resulting from the projection of any vector into 
a subspace is given by the length of the original vector multiplied by the cosine of 
the angle between the vector and the subspace (this angle has always magnitude 
< |). Now observe that Z[n) is a unit vector orthogonal to the subspace spanned by 
U(n — 1), and Pu(77)7r(n) is a vector orthogonal to the subspace spanned by U(n) 
with length equal to the cosine of the angle between 77(77) and this subspace. Then 
defining the inner-product as 



(a . b) = o t 6 



(77) 



we obtain 

7(77) = P\j{n)z{n)) = 1 X cos 6 x cos 0 = cos 2 0 (78) 

where 6 is the angle between the components of 77 ( 77 ) that are perpendicular to both 
subspaces { U ( 77 ) } and {U(n — 1 )}. The variable 7(77) will be used to update Ki,a’(u). 

In order to begin the derivation of the recursive procedure we now introduce 
three auxiliary filters: 
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- Forward Multichannel Prediction Filter 

- Backward Multichannel Prediction Filter 

- Gain Transversal Filter 
These are discussed next. 



F. AUXILIARY FILTERS 

The reason for the auxiliary filters will become clear later when it is shown that 
the update of a (n — 1) can be expressed in terms of the auxiliary filters parameters. 

1. (M+l) Channel Forward Prediction Filter 

We begin by defining a (M+l)-channel signal x F (n) formed by the data 
acquired by the 2-D mask when it is moved from n to n -f 1: 

Xp(») = [.Vi(n).J/2(n + !)•'■■> J/A /+1 { n + 1)] T (79) 

We want to find the best LS filter that predicts x F (n) based on the data y^ A ,(n) 
covered by the 2-D mask (see Figure 5). Let the coefficients of this filter be defined 
by a {(N + 1 )( A/ + 1) — 1) x (M + 1) matrix of the form 

E(n) = [£,(»).«")■• •■,£«+,(»)] (80) 

where each of the (( N -f 1 )(M -f 1) — 1 )-dimensional vectors £,(n) for (1 < i < M -f 1) 
is comprised of the multichannel prediction coefficients for channel i with the same 
support as a(??). The prediction of X/r(n) is given by 

x F (n) = F r (n)y J A ,(n) (SI) 

and the prediction error is 

e F (??) = x F (n) - x F (n) (82) 



If we define 



X F (n) 



y,(^),y 2 (^ + i)»---,y M+1 (n + 1)] 

[x F (0). x F (l), • • • ,x F (n)] T 



(S3) 
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Figure 5. Forward Filter Mask 

then Xf(n) is a (tj + 1) x (M + 1) matrix that contains all the multichannel data 
from the origin up to current value of the index n. 1 he estimate of X F (n) is thus 



X f ■(..) = Y,,„(n)F(.i) (84) 

and the prediction error, also a (n -f 1) x ( M +1) matrix, is 

E F (n) = X F (n)-X F (n) (85) 

The error covariance for the multichannel Forward Filter is: 

Ef(«) = Ef(n)E F (n) (86) 

Since we desire F(??) to minimize the error energy 

<r[E F (n)] = ir[E^(n)E F (n)j (87) 

the optimal LS filter is again obtained using the pseudo-inverse of Yj t ^(n) 

F(n) = (Yf -N (,.)Y, iW (n))"'Y?; N ( n )X F (n) (88) 
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This can be expressed using the operator defined in (55) as 



F (n) = K hN (n)X F (n) (89) 

The orthogonality principles mentioned before also apply. That is, the estimate X F (n) 
is the projection of the columns of X F (r?) onto the subspace spanned by the columns 
of the matrix containing the previous multichannel data (which in this case is the 
same as the 2-D data). 

X F (n) = P lt tf(n)X F (n) (90) 

The error E F (r?) is orthogonal to the estimate E F (n), i.e., the columns of these 
matrices span subspaces that are orthogonal to each other. 

EH") = PfjvMXH") (91) 

If e F (?7) is defined as the last row of E F (?7), it can be obtained using 77 ( 77 ) as 

e r( n ) = IL T (n)E F (?j) = 7r(«) T P^(n)X F (n) (92) 

2. (M+l) Channel Backward Prediction Filter 

For the backward prediction problem we define a (M + l)-channel signal 
x F (n) formed by the data left out by the 2-D mask when it is moved from time n to 
7r + l. This is given by (see Figure 6) 

Xfl(n) = [t/i ( 77 - N),y 2 {n - N),-- ■ ,VM+i(ri - A r )] T (93) 



Now define a(n + l)x((A’ + l)(J\/ + l) — 1) data matrix Y 0 ,k-\ (n) that has a structure 
similar to Y^/v^?) (47,48) 



XoV^ 1 ) 



Yo,A-i(n) = 



(94) 
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or 



Y 0 ,yv_i(«) = [y ,(»),■■•.* A ' +, y 1 ( n )^y 2 ( n ).y 2 (")r 



A+1 y.A^+i ( n )] ( 95 ) 



We note that 



il .«(") = &.N-l(" _1 ) 



(96) 



and 



Y 1 ,yv(n) = z- , Y 0l w-i(n) (97) 

The problem is now to find the best LS filter that predicts x fi (n) based on the data 
matrix Y 0 ,/v_i(n). Let the coefficients of this filter be defined by the ((N + 1)(A/ + 
1) — 1) x (A/ + 1) matrix 

B(n) = [b ,(n),b 2 (n),--.,b M+1 (n)] (98) 

where each of the ((A r -fi 1 )( A/ + 1)- l)-dimensional vectors b,(n) for (1 < i < A/ + 1) 
is comprised of the backward prediction coefficients for channel i. I he support of 
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this filter is the same as the support of a(n) but shifted one sample to the right (see 
Figure 6). 

The prediction of xg(n) is given by 

x B (n) = B(n) T y 0 ^_ 1 (n) (99) 

and the prediction error is 



e B {n) = x B (n) - x B (n) (100) 

We continue to proceed as we did for the forward multichannel filter. The (n -f 1) x 
(M + 1) matrix that contains the backward multichannel data from the origin up to 
n is defined as 



X„(n) = [.--'y 1 (.>).-- A ’y 2 (n),---,.-- A 'y„ + ,(»)] 
= [xb(0),x b ( 1), • • • ,x s (n)] T 



( 101 ) 



Then the estimate of X B (n) is 

X B (n) = Y 0v N_i(n)B(n) (102) 



and the prediction error (also a (n + 1) x ( M + 1) matrix) is 



E B (n) = X B (n)-X B (n) 



(103) 



The error covariance matrix for the multichannel backward filter is then given by 

E B (n) = E T B (n)E B (n) = X B (n) T P^_ 1 (n)X B (n) (104) 

To minimize the error energy 

tr [S s (n)j = tr [E B (n) T E B (n)] (105) 

our optimal LS filter is, once more, obtained using a pseudo-inverse, but this time for 
the data matrix Y 0 ,/v-i(rc)- 

B(n) = (Y 0 T A .. 1 (")Y„. W -i(n))' , Yj A ,. 1 ( n )XB(n) (106) 
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Finally defining a new operator K 0 ,A'-i(ft) in terms of the new data matrix Yo,A’-i(n) 

Ko.A--,(") = (Y 0 V,(n)Yo.N-,(n))'' Y 0 T A _,(n) (107) 

we can rewrite (106) as 

B(n) = K 0 ,jv_i(n)Xfi(n) (108) 

The orthogonality principles apply once more. The estimate X B (n) is 
the projection of the columns of Xg(n) onto the subspace spanned by the previous 
backward multichannel data 

Xfi(n) = P 0 ,A , -i('i)Xg(ri) (109) 

where Po,A'-i(ft) is the projection matrix associated with the vector space spanned 
by the columns of the new data set Y 0i A'_i(n). 

Po.N-i(>>) = Y 0 „v_,(») (Yj A .. 1 (n)Y 0Jl .. 1 (n))" 1 Yj A .,(n) (110) 

The error E^(/?) is orthogonal to the estimate X^(n), i.e., the columns of these two 
matrices span subspaces that are orthogonal to each other. 

Eb(») = PoVi(") x b(») (111) 

Defining e^(n) to be the last row of E B (n) we have 

e|(n) = ZL(rc) T E B (n) = 2L(n) T Po,A'- 1 (n)X B (n) (112) 

3. Gain Transversal Filter 

The gain transversal filter does not relate to specific prediction operations 
for the data but rather provides another way of quantifying the angular change y(n) 
between the subspaces associated with data matrices at times n and n — 1. To begin, 
consider the projection of the vector 7 r(n) onto the subspace spanned by Yo,a’-i(”)- 



29 



Since this projection Po,A r -i(n) 2 L(n) is contained in the subspace of Y 0 ,/v_i(n), we 
can express it as a linear combination of the columns of Y 0> /v_i(n). We write this as 

Po,A'-i( n k( ? 0 = Y o,A’-i(n)g(n) (113) 

where g(n) is a ((N +l)(M + \) — 1 )-dimensional vector of weights. Note that (113) can 
be interpreted as the LS prediction of 7r(n) based on the data matrix Y 0: A r -i(n) where 
g(??) is the ((A’ + 1 )( A/ + 1) — 1 )-dimensional vector of filter prediction coefficients. 



The estimate of 7 r(? 7 ) is thus given by 

i(n) = Po,N-i(n)zr(n) = Y 0 ,A 7 _i(n)g(n) (114) 

and the prediction error is 

e-(n) = tt(u) - Po,A'-i (n)l(n) = Po,A’-i( n M n ) ( 115 ) 

The last component of e_(?? ) can be obtained using 2 .( 77 ) and turns out to be equal 
to 7 ( 7 ?) as in (78) 

tr(n) = £(n) T Po,A'-i( n )£( n ) = 7( n ) = cos 2 & (116) 

Then substituting the middle part of (115) in (116) we obtain 

7 (n) = (a(w),2 r(n) - P 0 jv-i(n)7r(n)) = 1 - ( 2 ( 77 ), P 0 ,A’-i(n)7r(n)) (117) 

and using (113) in (117) we find 

7 ( 77 ) = 1 - ( 21 ( 77 ), Y 0 ,A-i(«)g(n)) = 1 -yJ A r_!(n)g( 7 ?) = cos 2 6 (118) 

w’here y^ v-1 (n) is the last row of the data matrix Y 0)J v-i(n). If w r e now recognize 
that cos 2 0 = 1 — sin 2 0 we see that 

>J A _i( n )g( n ) = sin20 ( 119 ) 

If we use the LS criteria to get g(?7), the solution is again, given in terms of a pseudo- 
inverse, or more conveniently in terms of the operator K 0i A T -i( n ) of (107) 

g(n) = K 0i a’— 1 ( 71 ) 21 ( 77 ) (120) 

This is in the same form that we have for the other filters. 
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G. FILTER UPDATE PROCEDURES 



In this section we determine the time update for the four filters defined so far. 
We begin by showing how to update Kj ^(n) and K 0) /v_i(n). This result will be used 
for updating each of the four filters. 

1. Transversal Operator Update 

Since the operations to be described now are common to all four filters, 
we develop the formulas in this subsection in terms of the generic matrices U and V 
introduced in section E of this chapter. Beginning with (66) we have 

Puv(") = Pu(«) + Pu(")V ([Pu(")V] T P6(")v) _1 V T P^(n) (121) 

where U is a (n + 1) x ((A r -f 1)( A/ -4- 1) — 1) matrix and V is a (n -4- 1) x (M 4- 1) 
matrix. By definition 

PuvH = [U.V] (|U,V] 7 ‘|U,V])‘ 1 |U,V] T (122) 

and 

Kuv(n) = ([U,V] r [U,V]) _1 [U,V] r (123) 

It follows that 



Kuv(«) = Kuv(«)Puv(”) 



(124) 



hence from (123) 



Kuv( n )[U,V] = I[(a t +1 )(A/+1 x [(A r +l )(A/+i (125) 

I N'xN' On'xA/' 

_ Om'xN' Ia/'xA/' 

with N 1 = ((N + 1 )( A/ + 1) — 1) and M' = (M + 1). Now we can partition (125) and 
write it as two equations, namely 



K UV (n)U 



Fv'xA' 

Om'xN' 



(126) 
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and 



Ku v(")V = 



Oa''xA/' 

Ia/'xA/' 

The first equation can be used in conjunction with Py = U(ri)Ku(n) to obtain 



(127) 



Kuv(n)Pu(«) = 



Ku(n) = 



Ku(n) 

OA/'x(n+l) 



(128) 



I N'xN 1 
Om'xN' 

Now substituting (121) in (124) and using (125)-(128) to simplify the resulting ex- 
pression we have 



Kuv(n) = 



+ < 



Oa"xA/' 

Ia/'xA/' 

, -1 



Ku(n) 

Oa/'x(»i+I) 



V 



► (129) 



Ku(n) 

_ 0 ; \/' X (n+l) 

X ([P6(n)V)^Pi(n)v)- , V’'Pt( n ) 

By forming [V, U] and following similar procedures, it is straightforward to show that 



Kvu(n) = 



Oa/'x(71+1) 


+ < 


f 


Ia/'xA/' 




Oa/'x(ti+1) 


\ 

V 


1 

* 
a ^ 

1 




< 


Oa'xA/' 




Ku(n) 


> 



(130) 



Ku *(n) = 



(131) 



x ([Pt,(n)V] T Pfc(n)v)~ V T Py(n) 

For the particular case when V = 7r(n) these relations are also valid but we can obtain 
them from the derivation given in the Appendix. The result is 

Ku(n — 1) Q 
-u^Kufn — 1) 1 
To check this, note that for V = 2 .( 71 ) (124) can be w T ritten as 

Kur(n) = Ku £ (n)P Ur( n ) (132) 

Then equating these two ways of obtaining Ku, £ (rc) we can update Ku(n — 1) to 
Ku(n) 

- 1 1 nl 1 Vf'n'l 

(133) 



, 

* 

3 

1 

t— ‘ 

IO 

1 




’ Ku(n) ‘ 




Ku(n)i(n) 


_ -U T Ku(77 - 1) 1 




0 




-1 



X ([P^(n)2(n)] T P{j(n)7r(n)) 1 2 T (n)P|j(n) 
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2. 2-D Filter Update 

We now develop update formulas for the 2-D filter. From (56) we have 



a(n) = K 1 , A r(n)y i (n) 



(134) 



To find a(n) in terms of a(n — 1) we use (134). Post-multiplying by y^n), and taking 
V = 7 i(n) and U = Y 1>A r(n) yields after simplification 



K, tN (n — 1) 0 




y> - ] ) 




’ K hN (n) ' 


. -y[, A ’ K i.^( n _ 0 1 . 




yiW 




0 



yj( n ) 



(135) 



K hN {n)n(n) 

-1 



£ r (")Pj;.v(n)Z 1 (") 

2L T (n)P^, A r(n)7r(n) 



At this point we note from (97) that since 

Y 1 ,.v(n) = c- 1 Y 0 ,A'-i(n) 
with yj v (0) = 0 as defined initially, then 



(136) 



Yi„vM = 



0' 



Y 0 ,A'-i(n - 1) 

From this it follows, using the respective definitions, that 



(137) 



Ki ,s{n) = [Q. K 0 , A -_i(n — 1)] 



(138) 



and 



Pi,A-(n) = 



0 0 r 

Q Po,A'-l(n-l) 

These results in conjunction with (118) and (120) allow us to write 



(139) 



g(n - 1) = K 0 „v-i (n - l)ir(n - 1) = Ki,A'(n)7r(n) 



(140) 
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and 



7(7? -1) = 1 - (r(77 - l),Y 0 pv-i(n - l)g(77 - 1)) = 1 -£^^(77 - l)g(r? - 1) 





= 1 - (7r(77),Y liA r(7?)g(77 - 1)) = 1 -^(n)g(fl - l) 


(141) 


and also 


7(77 - 1) = 2T T (77)Pi L >N (n)7r(77) 


(142) 


The upper part of (135) then becomes 






2(77 l)- 2 (n) s(n !) , ^ b 

7(77 - 1) 


(143) 


or 


2(7?) = a(v 1) + g(77 1) ^ . 

7(77-1) 


(144) 


To get d( 


7?) before updating a(n — 1) we substitute (144) in 






M”) = yi(n) -y J’ tJV (n)2(n) 


(145) 


to obtain 


e i ( 77 ) = ei(7i|n - 1) - yf >7V (n)g(n - 1) ^ 


(146) 


where we 


define 





e i( n l n - 1) = yi(n) ~li <N ( n )&( n - !) ( 147 ) 

Now (146) can be simplified using (141) 

e,(7?) = ci(n|n - 1) + (7 {n - 1) - 1) = e,(n|r? - 1)7(77 - 1) (148) 

7(n - 1) 

and finally (144) can be written as 

a(n) = a(n — 1) + g (77 — 1 )ej (n |r? — 1) (149) 

At this moment we have all we need to update a(n — 1) to 2(77) assuming that all 
variables with index 7? — 1 are available. However we want to find g( 7 ?) and 7(0) in 
order to proceed to update a(7?) to 2(77 + 1) as soon as Y] ^(t 7 + 1) is available. This 
is discussed next. 
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3. Gain Filter Update 



Begin by forming a new data matrix Y 0 ,A r ( 7? ) that can be expressed in terms 
ofY 1JV (n) or Y 0> ,v_i(n) and the other matrices Xf(n) or Xg(n) after a multiplication 
by suitable permutation matrices. Each row and column of the permutation matrix 
contains a single 1 with all the other entries equal to zero. The positions of the l’s are 
chosen so that when this matrix premultiplies one of the data matrices it rearranges 
its columns to conform with the desired order. We write this as 



Y 0 ,A-(n) = [X F (n), Y lt *(n)] tf F = [Y 0 ,a'-i ( n), X B (n)} 



(150) 



Now form K 0 ,a'( 7; ) and post-multiply the result by 7r(n) using the first part of (150) 
and (130) with V = X F (?j) and U = Y i,a t ( 7 0 to obtain 



K 0 jv(n)z(n) = *1 



O.U'x(n-fl) 


z(n) + tyjr 


Ia/'xA/' 


K liA '(n) 


— K lj A T ( n )X F (n) 






-F( n ) 



(151) 






X 



\ 



-1 



Xpn)P,V(")XF(n) 






X F(”) p i>( n )2t(n) 



Ef(«) / er(n) 

Since the permutation matrices are orthogonal (ty T = ty -1 ) we have 



* F K 0 .A-(n)7r( n ) = 



C>At'x(n + l) 

K lA (n) 



7r(n) + 



Ia/'xA/' 

-F(n) 



E F 1 (n)e F (??) (152) 



By analogy with the definition of gain filter (120) we can define 

fi'(n) = ^fK 0 ,n(«)’l( 77 ) 



(153) 



or 



«£s» = Kojv(n)i(n) 



(154) 
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as the [(A' + \)(M + 1) — l] + (M + l)-order LS predictor of 7r(n) using the permuted 
data matrix Y 0 ,a t (u)'I , f- Equation (152) can be rewritten using (140) as 



£'(") = 



o Afl 

~ 1) 



+ 



E F 1 (n)e F (n) 



(155) 



I A/'xAf 

-F(n) 

An alternative way to find K 0 ,Ar(n)ir(n) is to use the second part of (150) 
in (129) with V = X B (n) and U = Y 0i A f -i(u) to obtain 

K 0 ,iv-i(n) 

OM'x(n+l) 

— K 0 ,A r -i ( n )X F (n) 

Ia/'xA/' 

Now similarly define 



K 0 ,A'( ?? )iL(u) = 



+ 



ir(n) 



(156) 



V B 1 (n)e B (n) 



g"(u) = 'I'bK 0 ,a’(u)il(u) = 



g(”) 




’ -B(n) ‘ 


+ 


Om> 




_ Ia/'xA/' 



(n)e B (??) (157) 



as the [(A’ + 1 )(M 4- 1) — 1] + (Al + l)-order LS predictor of 7r(n) using the permuted 
data matrix Y^A’-i^O'Lg- Since 



vfa'W = Kf-atn) 



T'/ 



(158) 



it follows that 



g"(n) = 'Jsl’k'fn) 



(159) 



It is useful to partition g "(??) as 

g» = 



M(n) 
m (n) 



(160) 



where M(?j) is a (A ? + 1 )(-A/ + 1) — 1 vector and m(n) is a (M + 1) vector. The lower 
partition of (157) is then 



m(n) = (n)e B (n) 



(161) 
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This can be substituted in the upper partition of (157) to give 

g(n) = M(n) + B(n)m(n) (162) 

Thus it is seen that if g'(n) is available, we can obtain g(n), provided that we have 
F(n), B(??), and all the associated parameters already available. 

The inversions of E f(n) and E F (n) can also be carried out recursively using 
the matrix inversion lemma (details are given in subsection 7 of this chapter). 

4. Forward Filter Update 

To update the forward filter, proceed as follows. From (89) we have 



F(n) = K 1JV (n)X F (n) 



(163) 



To find F(7?) in terms of F(?? — 1) we use (134) post-multiplied by X F (? 7 ) with V = 
?r(n) and U(n) = Y liF (?7) to obtain 



1 

> 

1 

lo 

1 




'x F (77-i)‘ 


. -y[ A .Ki,A-(7? - i) l 




i 

X 

fO 

| 



^ T (») P i L .A'(») X / r (») 

£ T H P Mv( n )£( n ) 



1 

> 

i 


X F ( 77 ) 


K,. «(..)!(») 




0 




-1 



(164) 



Then using only the upper partition we find 

F(n-l) = F(n)-g(n-l) ’ 

7 (n - 1) 



(165) 



or 

F(,.) = F(n-l) + g(n-l) (166) 

7 (n-l) 

To compute e/r(77) before having F(?7 — 1) we note from (81) and (82) that 



e F (?r) = x F (i?) - F T (n)y l N (n) 



(167) 
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and substitute (166) to obtain 



£(”) = e ?(”l” - 1) “ “ 1) 



e?(n) 

7 (n - 1) 



(168) 



where we define 



e F {n\n - 1) = x F (n) - F T (n - l)y iN (n) 



(169) 



Now (168) can be simplified to 



ejr(n) = ejr(jj |n - 1) + (7 (n - 1) - 1) - = e T F (n\n - 1)7(77 - 1) (170) 



7 (n - 1) 



Hence from (166) we have 



F(n) = F(n - 1) + g (n — l)e^(7j|n - 1) 



(171) 



This is the desired update formula for F(n). 

To compute L F (fi) we use (71) with U = Y lt ^(n) , V = £( 77 ) and z = 
y = Xf(t 7) and. also (75) to obtain 

e F (r?)e£(7j) 



XF(n) y Pt A -»X F (77) = S F (n)- 



7 (n - 1) 



(172) 



Then using (170) we obtain 



Xf(n) 



Pf>-1) 0 



o r 0 

Finally, simplifying (173) we find 



X F (n) = S F (n) - e F (n)ep(?j|n - 1) (173) 



: F (n - 1) = Ti F (n) — e F (n)e^-(n\n — 1) 



(174) 



or 



~F(n) - — 1) + ef-(n)ep(n|n - 1) 



(175) 



Thus all the parameters for the forward multichannel prediction problem can be 
updated from the values obtained at the end of the (n — 1) recursion. 
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5. Backward Filter Update 

The update procedure for the backward filter is similar to that for the 
forward filter. From (108) we have 



B(n) = K 0 ,N-i(n)X s (n) 



(176) 



To find B(n) in terms of B(n — 1) we use (134) with V = 7r(n) and U = Yo,.v-i(n) 
and postmultiply by X B (n) to obtain 



Ko.A’-i(n-l) Q 

-yJ.v_i K o.A-i(n - 1) 1 _ 

K 0 ,a’-i (n) 

0 



Xfl(n)- 



X s (n-1) 

x£(n) 

K 0 ,N-i(n) 2 l(n) 
-1 



(177) 



£ ( n ) P o,A T -i(») X B( n ) 

^ r ( n ) P o,A'-i( n M n ) 



Then using only the upper partition of this equation we obtain 

e£(n) 



B(n - 1) = B(n) - g(n) 



7(n) 



(178) 



or 



B(n) = B (?j - 1) + g(n) 



7(n) 



(179) 



To compute e B (n) before having B(n — 1) we substitute (179) in 



e B (n) = x B (n) - B T (n)^ A) _ ] (n) 



( 180 ) 



to find 



e B (n) 



e *(") = e B (n|n - 1) - — 



(181) 



where we define 



e B (n|n - 1) = x B (n) - B r (u - l)^ A ,_j(n) 



(182) 
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Now (181) can be simplified to 

efl(n) = e fiM 77 - 1) + (~t{ n ) ~ 1) ^TT- = eB(n|n - l)l(n) (183) 

7(n) 

Therefore 



B(n) = B(n — 1) + g(n)e B (n|n - 1) (184) 

We note here that the update of g(n) depends on B(n) and vice-versa, but 
we will address that problem shortly. 

To compute Eb(t?) we use (71) with U = Y 0 ,/v-i(n) , V = 7L(n) and 
z = y = X B (n), and also (75) to obtain 

X5(ii]P; i# ., ! (n)X»(..) = SbH - eB( " ( l ^ ( ’' ) (1S5) 



Then using ( 183) 



Xj(n) 



P o.A’-i( 7} - !) Q 



0 T 0 

and simplifying we find 

T B (n - 1) = Sfi(n) - e e (n)e£H?i - 1) 



X s (7j) = Eb(?j) - e B (7?)eg(7?|n - 1) (186) 



(187) 



or 



Efi(n) = Sb(» - 1) + eB(n)eg(n|n - 1) 



(188) 



We are now ready to solve the problem of mutual dependence between g(n) 
and B(n). For this refer to (161) and (162). Substituting (184) in (162) we obtain 

g(rr) = M(n) + B(t? — l)m(7r) + ^(n)e^(n\n — l)m(n) ( 1 89) 

Now note that since 



e^H 77 - l)m(n) = e T B (n\n - l)E B 1 (n)e B (n) 



is a scalar, (1S9) becomes 

g(?j) = [M(t?) + B ( 7? - l)m(n)] (1 - e B (n|7? - l)m(n)) -1 
and the problem is solved. 



(190) 



(191) 
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6. Angle Parameter Update 



The final relation need to complete the recursion is an update formula for 
7 ( 7 ?). This update is performed in a manner similar to the update of g(n); that is we 
compute the angle parameter related to the data matrix Y 0 ,A'(n) using two different 
approaches and then equate them. By the same procedure that lead to (118) we 
define 



'/(") = 1 - yJ B (n)K 0 ,B(n)f(ii) 



(192) 



Then using (154) for K 0 „v(n) 7 r(n) and partitioning N (n) into 



Zo,A'( n ) = [xJ(n),yJ' A ,(n) 



*7 



we obtain 



j'( n ) = 1 - [xHn),yf iA ,(r?)j^F^Fg'(n) 

I 

and using (152) and simplifying we find 



V(") = *>(" - 1) “ eHn)S F 1 (7?)e F (n) 

If we now use (157) for Ko,A r (n)z(n) and partition f (n) as 



y 0 ,A’( 7? )= 






B 



we obtain 



7 # (”) = 1 “ [yJ,A’-i( n )’ x B( n )] g"(n) 

~T~ 

This can be simplified to 

7 '(n) = 7 ( 7 ?) - e F (n)Sg 1 ( 77 )e B (n) 
but using (183) and (161) we obtain 



(193) 



(194) 



(195) 



(196) 



(197) 



(198) 



7 (n) = 7'(??)[1 - el(n\n - l)m(n)] 



-1 



(199) 
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where everything in the right side of (199) is available. 

Since n ) does not appear explicitly in any of the updates we need only 
to update E^n). This can be easily done using the matrix inversion lemma. The 
specific procedure is outlined below. 

7. Inverse Matrix Update 

Although all the quantities have now been derived that permit the recursion 
to continue, we note that the inverse matrix E F \ and not the matrix itself occurs in 
the recursions. This is a fairly small matrix and could be inverted directly. However it 
is more efficient to also compute the inverses recursively. This is easily accomplished 
using the matrix inversion lemma [Ref. 5, 7]. We have from (175) 



E F (n) = E F (n - 1) + e F (n)e F (n|?? - 1) 



or using (183) we have 



E F (?>) = S F (n - 1) + 



e F (n)e F (n) 

7 (n - 1) 



The matrix inversion lemma states that if 



A = E + FG _1 F t 



then 



A -1 = E" 1 - E -1 F[G + F t E- 1 F]- , F t E" 1 



For 



A = S F (n) 

E = E F (n — 1) 
F = e F (n) 

G = 7(n — 1) 



( 200 ) 



( 201 ) 



( 202 ) 



(203) 



(204) 

(205) 

(206) 
(207) 
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we obtain 



£?’(») = ?f (>'-!)- 



E f ’(n - 1 ie f (n)ejrlnj£ f '(n - 1) 



(20S) 



ifo-O+efWEr'fri-OeHn) 

The computation of E F ] ( n ) using the matrix lemma amounts to 1.5(A/ + l) 2 + 2.5(A/ + 
1) multiplications and one division. 

Although the matrix Eg’ (n) could be computed in a similar way, the in- 
verse of Eb(u) is not needed for the 2-D filter. 

H. ALGORITHM SUMMARY 

1. Computational Complexity 

The computat ional cost of the algorithm depends on the shape and size of 
the filter mask. For each mask we must define, as mentioned before, both a forward 
and a backward multichannel signal a number of channels equal to the number of new 
points acquired or dropped off by the mask. Call this number A] . For the quarter 
plane filter we used A j = M + 1. Now further define A' 2 as the number of coefficients 
in the 2-D filter mask (for the case used in the derivation 1\ 2 = {M 4- 1)(A’ -f 1) — 1). 
The computational cost of the algorithm depends only on these two numbers. Note 
that the use of the permutation matrices only changes the ordering of the elements in 
the matrices affected by them, allowing the procedure to be used for any shape and 
size of filter. The permutation can be obtained without any multiplications, hence 
it will not be considered in this analysis. We also mentioned before the use of a 
forgetting factor A in the cost function to handle nonstationary signals. The effect of 
this constant in the final algorithm shows up only in the computation of the inverse 
error covariance matrix for the forward multichannel filter: 

(n ~ l)e f (n)e^(n)Sf 1 (n - 1) \ 



Sf 1 (») = A - 1 S^n-l)- 



(209) 



h (n - 1) + e^(n)S F , (n - l)e F (n)/ 

This increases the computational cost by K1 multiplications. A detailed count of the 
number of operations required by the algorithm is given in subsection 3 below. The 
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method used for the 1-D RLS can be applied to 2-D signals by forming the I \ 2 x A' 2 
2-D deterministic correlation matrix and the A' 2 x 1 vector of deterministic cross- 
correlation terms between the desired filter response and the filter inputs. This form 
of 2-D RLS algorithm requires 1.5/\| d-4.5A' 2 operations per iteration which increases 
quadrat ically with A' 2 . 

2. Initial Conditions 

The recursive implementation of the algorithm requires some initialization 
for the variables used. The assumption that the signal is zero before iteration n = 0 
is reasonable and suggests that all the filter parameters including those of the gain 
filter, should all be set to zero. This choice of initial conditions implies that the 
angle parameter ^(O) must be set to 1.0 since all of the subspaces associated with 
previous data are the null space. However, a positive forward prediction error energy 
is necessary for the algorithm to start. The initial conditions used are therefore 



a(0) = 0 


(210) 


5 

0 

II 

O 


(211) 


B(0) = O 


(212) 


g(°) = 0 


(213) 


1(0) = 1 


(214) 


V-l _ 

—‘F — ^-Lv/'x A/' 


(215) 


8 = small positive constant 


(216) 



3. Iteration at time n and Required Arithmetic Operations 

The terms to be computed at each iteration, and their formulas are given 

below. 

- A priori 2-D prediction error (A' 2 operations) 

ei{n\n - 1) = Pi(n) - yj^ r (n)a(n - 1) (217) 
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- 2-D filter update (h '2 operations) 



a(n) = a (?i — 1) + g (n — l)ei(n|n — 1) (218) 

- A priori multichannel forward prediction error (7\ 1 K? operations) 

e F (n|r? - 1) = x F (n) - F T (n - l)y l7V (n) (219) 

- Multichannel forward prediction error (I\\ operations) 



e F (n) = e F (n|n - l) 7 (n - 1) 



( 220 ) 



- Inverse error covariance matrix for the multichannel forward filter ( 1 .5A' 2 + 
.5 A', operations) 

v , F J (n - l)e F (n)e F (n)E F 1 (n - 1) 



S F 1 ( n ) — — 1) — I i'll T/ 1 v~l ( 1 \ / \ 

l(n - 1) + e F (n)S F (n - 1 )e F (n) 

- Multichannel forward filter update (I\\ I\ 2 operations) 

F(?j) = F(?? - 1 ) + g(n - l)e F (n|n - 1) 

- Extended gain transversal filter (A 2 + K\ I \2 operations) 



S"(") = 



M(n) 


J 


Om' 




Ia/'xM' 


\ 

S F 1 e F (n) 


= Vb^f 


+ 


m(?j) 


\ 


. s(n - 1 ) . 




. - F M . 


1 



( 221 ) 



( 222 ) 



(223) 



- Extended angle parameter (I(\ operations using previous results) 

V(”) = 0(» - 1) - e F (n)E F (n)e F (n) (224) 



- A priori multichannel backward prediction error ( I\\I \2 operations) 

e B (n|?? - 1) = xl(n) - B T (n - l)yJ A r_,(n) (225) 



- Angle parameter (I\\ + 1 operations) 



7 ( 7J ) = V(n)[l - e F (n|n - l)m(n)] 1 



(226) 
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Gain transversal filter (A'jA' 2 + A' 2 operations using previous results) 



g(n) = [M(n) + B(n - l)m(n)] (1 - eg(n|n - l)m(n)) 1 
- Multichannel backward filter update ( A'i A’ 2 operations) 

B(n) = B(n - 1) + g(n)ej(n|n - 1) 

The total number of operations (multiplications or divisions) required per 
by the algorithm is 

2.5A 1 T 6 A ] A 2 T 4 . 5 A j T 3 A 2 T 1 



(227) 



(228) 

iteration 



(229) 
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IV. EXTENSIONS, APPLICATIONS AND RESULTS 



The 2-D FRLS algorithm was developed in the previous chapter for a 2-D pre- 
diction filter whose observed signal was the same as the sequence under the filter 
mask. In this particular case the coefficients of a(n) for the 2-D filter, are identical to 
the coefficients of the first column fi(n) of the multichannel filter F(n). To see this, 
note that by definition, the error energy for the multichannel forward prediction filter 
(87) can be rewritten as the summation of the error covariance associated with each 
of the (M + l) channels of the forward prediction filter, where the first term is ei(n), 
i.e., the sum of squared errors for the first channel (44). We can rewrite (87) as 

/r[I F (n)] = tr[E T F (n)E F (n)} = c a (n) + T F (n) (230) 

where T F (n) is independent of the coefficients in fj(n). The 2-D filter coefficients and 
the coefficients in the first column of the forward multichannel filter are the result of 
minimizing the same cost function and are thus identical. 

We will now consider the case when the data sequence under the prediction 
mask is distinct from the observed sequence (general FIR Wiener filter) and also the 
case when the filter mask covers not only observation data, but also other input data 
sequence (ARMA model). Following that we will present the results of computer 
simulations to illustrate the applications of the adaptive algorithms for 2-D signal 
processing. 

A. GENERAL FIR WIENER FILTER 

The extension of the 2-D FRLS to the general FIR Wiener filtering problem is 
straightforward to obtain by following the same concepts presented in chapter 3. To 
be specific we again consider a first quadrant (N + 1) x ( M -|- 1) filter here. However, 
the procedure applies more generally to nonsymmetric half plane and other filters as 
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discussed in section H of chapter 3. The multichannel notation developed in chapter 
3 is used here to define two I\ x L 2-D sequences d^(n) and J/i(n), where d\(n) is the 
sequence we want to estimate based upon the input sequence 2/1(7?). Our goal is to 
find a prediction filter of the form 

=xbv( n ) a ( n ) (231) 

with y N defined as in (39) and a (n) defined as in (41) that minimizes the sum of 
squared errors 

£.(») = Eh(01 2 (232) 

1=0 

where the prediction error ei(r?) is given by 

ei(n) = di(n) — di(n) (233) 

This can be written in vector notation as 

«i(n) = e[(n)ej(n) (234) 

where 

e,(n) = d,(n) — d^n) (235) 

with d,(7?) defined as (n + l)-dimensional vector that contains the observation data 
from the origin up to point ??. Then the estimate dj(7j) is given by 

d^n) = Y ]i A/(n) a(n) (236) 

where Yi i x(t?) is the same data matrix as in (47) and (48). The least squares solution 
for a(?i) is once more given by the pseudo-inverse 

a(n) = (Yf, K (n)Y,,«(n))~ , Yf iN ( n )d l ( n ) (237) 

and the estimate of dj(7?) and the prediction error can also be expressed in terms of 
the projection matrices defined in chapter 3. The estimate d^T?) is the projection of 



48 



d,(n) onto the subspace spanned by the input data 

d](n) = P liA r(n)di(r?) (238) 

The error ej(n) is orthogonal to the estimate d^n) and is given by 

£i(n) = Pttv( n )di(n) (239) 

The operator Kj , a'(«) defined in (55) can be used to rewrite a(n) as 

a(n) = K 1 , 7 v(n)^ 1 (n) (240) 

We already know how to obtain Ki iA ’(n) from Ki^( n — 1) in a efficient way, 
thus we are able update a (n — 1) to a(n) as soon as d](n) is available. The complete 
algorithm is the same as the one summarized in section H of chapter 3 with t/i(n) 
replaced by d^(n) in (217). 

B. ARMA MODEL 

The ARMA version of the 2-D FRLS can be viewed as follows. Let us call 
the output or observed data y\(n) and the input data ici(n). For the present let 
us assume that this latter sequence is also known or observed. Let us separate the 
coefficients that operate on the two different sequences and call a(n) the vector of 
AR coefficients of the filter, and b(n) the vector of MA coefficients of the filter. As 
before, we develop this extension of the 2-D FRLS to ARMA models assuming a first 
quadrant (A T + 1) x (M + 1) quarter plane mask for both the AR and MA components 
of the filter, noting that more general forms are possible. Using the scanning index n 
defined before, we proceed by defining an ARMA prediction filter of the form 

h(") = y[ iN (n)a(n) + wf >A ,(n)b(n) (241) 

with y v (n) and w l A -(?!) defined using the same concepts as in (39). We want to find 
a(n) and b(n), the filter coefficients defined respectively for each mask, to minimize 
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the sum of squared errors 



<i(”) = UMO] 2 

1=0 

where the prediction error is given by 

c i(”) = 2/i( n ) - yi( n ) 

This can be written in vector notation as 

ti(n) = £[(n)e,(n) 

with 



(242) 



(243) 



(244) 



£i(»0 = ^(w) — XiC”) ( 245 ) 

We can combine the AR and MA coefficients in one single vector c(n) as 

c(n) = [a T (n),b T (n)] T (246) 

The data under the mask can then be expressed as 

Zi..v(”) = [xf tA r(n),ffi[ tA .(n)] r (247) 

Now we have for the estimate of y (n) 

y,(”) = Zi , N (n)c(n) (248) 

where Z 1 iA t(? 2 ) is the data matrix 

Z 1 . A -(n) = [Y 1 , A -(n),W 1 , A -(n)] (249) 



formed by Yi^(n) and W i,/v(n), the data matrices associated respectively with the 
AR and the MA masks with the same structure as (47) and (48). The least squares 
solution for c (n) is given by the pseudo-inverse of Z i,n(^) 

c(??) = (z[ iA r(n)Z 1)A '(n)) ’ Zf JV (n)y l (n) (250) 

After defining new projection matrices and transversal filter operators associated with 
the new data matrices, the algorithm to recursively update c (n) closely follows the 
procedures developed in chapter 3. 
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C. ARMA MODELING WITH UNKNOWN INPUT 



In the previous section on 2-D ARMA modeling, it was assumed that the se- 
quence uq (n) was known and available. This is an ideal situation which could some- 
times exist for example in a system identification problem (see Figure 7). Given both 




Figure 7. Modeling with Known Input 



the input sequence iU](n), and the output sequence yi(n) and an assumed linearity 
and order for the model, we have all the information necessary to identify the unknown 
system under analysis. Knowledge of the input sequence is not always available, how- 
ever this is the case, for example, in the problem of estimating the parameters of an 
ARMA model where uq(n) is a 2-D white noise sequence. The ARMA parameter 
estimation problem for unknown input was addressed using recursive algorithms for 

1- D signals by embedding the AR and MA estimation in a 2-Channel AR modeling 
problem [Ref. 2]. The difficulties with the procedures suggested are similar here; the 

2- D white noise process w } (n) is not known and needs to be estimated from the data. 
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Assume that at some moment n all the data under the MA mask is known 
except the most recent sample of the noise sequence uq(n). Further assume that the 
ARM A filter coefficients estimated so far are fairly close to the actual coefficients 
that characterize the system. The natural choice for an estimate for the unknown 
noise sample is the error ej(n), i.e., we expect the error to be zero if the noise sample 
was known. This procedure is known as ” bootstrapping” [Ref. 2] (see Figure 8). It 




involves two steps. First an estimate j/i(n) is obtained assuming uq(n ) = 0 and using 
the old parameter estimates. Secondly Wi(n) is set equal to ei(n) and we proceed as 
in the case of a known input sequence. This method is highly nonlinear and hence 
very difficult to analyze. However it was found to give reasonable results in practice. 

D. SIMULATION RESULTS 

The 2-D FRLS algorithm was tested both on computer-generated data and on 
digitized images. For a baseline reference the 2-D LMS algorithm was also imple- 
mented. The synthetic data for the system identification and parameter estimation 
results was obtained by driving different 2-D transfer functions with computer gen- 
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erated white Gaussian noise. Most of the tests were performed on 32 x 32 point 
2-D data sequences. Image coding was performed for 256 x 256 pixel images using 
the VICOM system for display purposes and a VAX-785 computer for the algorithm 
implementation. The algorithms were coded in Fortran. 

1. 2-D System Identification 

The first computer simulation in system identification was performed using 
a (2x2) MA model. A computer-generated white Gaussian noise sequence was applied 
both to a filter with known coefficients and to the adaptive filter in the manner of 
Figure 7. The error between the output of the two filters was used to adjust the 
coefficients of the adaptive filter. The MA filter had the form 

d(??,,7? 2 ) = fc(0.0)y(ni,n 2 ) + 6(0. l)y(n u n 2 - 1) (251) 

+ 6(l,0)s/(«i - l,n 2 ) + 6( 1. l)y(nj - l,n 2 - 1) 

with 



6(0.0) 


= 1.0 


(252) 


6(0,1) 


= 0.6 


(253) 


6(1,0) 


= -0.3 


(254) 


6(1,1) 


= 0.3 


(255) 



The rate of convergence is shown in Figure 9. As can be seen, each of the coefficients 
converged very rapidly to the actual value. The 2-D LMS algorithm was also imple- 
mented for this case, but as can be seen in Figure 10 the convergence rate is very 
slow. 

The next simulation was performed using an ARMA model where both 
the AR and the MA masks were first order nonsymmetrical half plane filters. A 
computer-generated white Gaussian noise sequence was applied both to a filter with 
known coefficients and to the adaptive filter. The error between the output of the 
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Figure 9 . System Identification MA Model ' 

two filters was used to adjust the coefficients of the adaptive filter. The ARM A filter 
had the form 



2 /(ni,n 2 ) = a(l,0)?/(ni - l,n 2 ) + a(-l,l)y(m + l,n 2 - 1) (256) 

+ <*( 0 , l)j/(»i,n 2 - 1 ) + a(l,l)y(ni - l,n 2 - 1 ) 

+ 6 ( 1 , 0 ) 10 ( 7 ?! - l,n 2 ) + 6(-l, l)u?(r?i + l,r? 2 - 1) 

+ 6(0, 1 ) 10 ( 7 ?!, n 2 — 1) + 6(1, l)to(r?i — l,r? 2 — 1) 



= 0.8 



= - 0.1 



with 



a(l,0) 

«(-U) 

o(0, 1) 
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0.4 



(257) 

(258) 

(259) 
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g 




Figure 10. System Identification MA Model (2-D LMS) 



a(l, 1) 


= -0.5 


(260) 


6(1,0) 


= -0.2 


(261) 


6(— 1, 1) 


= 0.1 


(262) 


6(0,1) 


= 0.8 


(263) 


6(1,1) 


= -0.5 


(264) 



The rate of convergence is shown in Figure 11 and Figure 12 for both the AR and the 
MA coefficients. As can be seen there, each of the coefficients converged in about 80 
iterations to the true value. 

To test the behavior of the algorithm with non-stationary data the same 
model was run with data obtained by changing some of the coefficients at iteration 
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Figure 11. System Identification ARMA Model (AR coefT. stationary 
data) 

120 to 



«(1,0) 


= 0.1 


(265) 


a(-l,l) 


= 0.6 


(266) 


6(1,0) 


= 0.4 


(267) 


6(0,1) 


= 0.2 


(268) 



As shown in Figure 13 and Figure 14, the AR and the MA coefficients converged to 
the true initial values, but at iteration 120 when some of the coefficients were changed 
the algorithm started slowly tracking the new coefficients. Since no forgetting factor 
was used here, the algorithm does not forget the initial data and the convergence is 
very slow. The estimated coefficients remain biased. By using a forgetting factor 
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Figure 12. System Identification ARMA Model (MA coefT. Stationary 
data) 

of A = 0.95 wc obtained the results shown in Figure 15 and Figure 16. With the 
introduction of this forgetting factor the filter coefficients were able to lock on the 
new coefficients in about 150 more iterations. 

2. 2-D Parameter Estimation 

The first computer simulation in parameter estimation was performed us- 
ing a first order nonsymmctric half plane AR model. A computer- generated white 
Gaussian noise sequence was applied to a 2-D AR filter with known coefficients to 
obtain the data. T he adaptive filter has access only to the AR (output) sequence as 
shown in Figure 8. The error between the output of the two filters was used to adjust 
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Figure 13. System Identification ARMA Model (AR coefT. nonstationary 
data) A = 1.0 



the coefficients of the adaptive filter. The AR filter had the form 



= a(l,0 )y(n, 


- l,n 2 ) + 


a(-i,i)y(n 1 + F n 2 - 1 ) 


(269) 


+ o(0, l)y(ni. 


,»2 - 1) + 


a(l, \)y(n\ - 1 ,n 2 - 1) + w{n u n 2 ) 






a(l,0) 


= 0.8 


(270) 




a(-l, 1) 


= -0.1 


(271) 




a(0,l) 


= 0.4 


(272) 




a(l,l) 


= -0.5 


(273) 
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Figure 14. System Identification ARMA Model (MA coefT. nonstationary 
data) A = 1 .0 

The rate of convergence is shown in Figure 17 and as can be seen each of the coeffi- 
cients converged to values close to the true values. Since the algorithm does not know 
the input sequence and has to estimate it, there is a slight variation of the estimated 
coefficient around the true values. Here again the 2-D LMS algorithm was imple- 
mented, but as can be seen from Figure 18 some of the coefficients did not converge 
even after 900 iterations. 

Next the modeling procedure with unknown input w r as tested for the ARMA 
case. A first order nonsynnnetric half plane mask was used for both the AR and MA 
coefficients. A computer-generated white Gaussian noise sequence was applied to a 
filter with known coefficients to obtain the ARMA data. The adaptive filter does not 
have access to the driving sequence. The error between the output of the two filters 
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Figure 15. System Identification ARMA Model (AR coefF. nonstationary 
data) A = 0.95 



was used to adjust the coefficients of the adaptive filter. Ihe ARMA filter had the 
form 



y(«i>»2) 



= a(l,0)t/(n, - l,n 2 ) + a(-l,l)y( n i + 1,«2 - 1) (274) 

+ a(0, \)y{n u n 2 - 1) + a(l, l)l/( n i - L n 2 - 1) 

-f U’(nj, ?i 2 ) + 6(1, 0)ie(nj — 1, n 2 ) + K — L l) u, ( n i + L n 2 — 1) 

4 6(0, l)t/-’(»i)»2 — 1) 4- 6(1, l)u7(ni — l,n 2 — 1) 



with 



a(l, 0) = 0.8 



«(-!,!) = 
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- 0.1 



(275) 

(276) 
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Figure 16. System Identification AR.MA Model (MA coefF. nonstationary 
data) A = 0.95 



«(0,1) 


= 0.4 


(277) 


0(1,1) 


= -0.5 


(278) 


6(1,0) 


= -0.2 


(279) 


K-U) 


= 0.1 


(280) 


6(0,1) 


= 0.8 


(281) 


6(1,1) 


= -0.5 


(282) 



The rate of convergence is shown in Figure 19 and Figure 20 and as can be seen each 
of the Alt coefficients converged to values close to the true values. This is similar to 
what happened for the AR parameter estimation problem. The MA coefficients also 
converged to values acceptably close to the actual MA values, but they show some 
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Figure 17. Parameter Estimation AR Model 

I 

bias. As in the previous case the algorithm does not have knowledge of the input 
sequence, hence estimates it using the bootstrapping scheme. Although this method is 
difficult to analyze due to its non-linear nature, the results obtained are encouraging. 

3. Image Coding 

An image coding problem was also used to test the adaptive algorithm. 
Two black and white images, Figure 21 and Figure 22, with 256 x 256 pixels were the 
2-D sequences used to perform AR and ARMA parameter estimation as described 
above. The 2-D LMS algorithm was also applied to the images to estimate the AR 
parameters. These parameters were then used to form the linear predictor used in 
the coding and decoding scheme shown in Figure 23. A two level quantizer with the 
step size value taken from the Max table was used, assuming that the error sequence 
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Figure 18. Parayneter Estimation AR Model (2-D LMS) 



obtained had a Gaussian distribution [Ref. 12]. This resulted in a quantized error 
sequence corresponding to one bit per pixel. The image reconstruction was performed 
by driving the inverse filter with the quantized error sequence. 

The first image was reconstructed after being encoded using the three 
different linear predictors and the results are shown in Figures 24, 25 and 26. The 
error images between the original image and the reconstructed images are shown in 
Figures 27, 28 and 29. One of the most widely used measures for the performance 
of a predictive coder is the signal-to-noise ratio (SNR). For a I\ x L 2-D sequence 
y(n],n 2 ), it can be defined as follows: 



SNR(dB) = 10 log 






(283) 
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Figure 19. Parameter Estimation ARMA Model (AR coefF.) 

The subjective quality of the reconstructed images agrees with the signal- 
to-noise ratio (SNR) obtained for each case. 



2-D LMS (AR) 


SNR = 15.96 dD 


(284) 


2-D FRLS (AR) 


SNR = 17.24 dB 


(285) 


2-D FRLS (ARMA) 


SNR = 18.29 dB 


(286) 



The better performance of the 2-D FRLS when compared with the 2-D LMS is ap- 
parent in the results. The improvements obtained for this image with the ARMA 
model imply that the model was able to fit the image better than the AR model and, 
thus produce an error sequence that was more nearly white. 

The algorithm was also tested on the second image (Figure 22). The results 
are shown in Figures 30, 31 and 32. The error images between the original image 
and the reconstructed images are shown in Figures 33, 34 and 35. The subjective 
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Figure 20. Parameter Estimation ARMA Model (MA coeff.) 



quality of the reconstructed images once again agrees with the signal-to-noise ratio 



(SNR) obtained for each case, but the quality difference between the reconstructed 



images obtained using different methods is smaller. 



2-D LMS (AR) 


SNR = 17.25 dB 


(287) 


2-D FRLS (AR) 


SNR = 18.16 dB 


(288) 


2-D FRLS (ARMA) 


SNR = 18.62 dB 


(289) 



In particular, the improvements obtained with the ARMA model for this case are 
not as large. This is probably because the second image has a large number of sharp 
edges, and these are quite difficult to model with any finite order linear model. 




Figure 21. linage 1 Original 




Figure 22. Image 2 Original 
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Figure 23. Predictive coding, (taken from [Ref. G]) 




Figure 24 . Image 1 Reconstructed 2-D LMS (Alt) 



G7 




Figure 25. Image 1 Reconstructed 2-D FRLS (AR) 




Figure 26. Image 1 Reconstructed 2-D FRLS (ARMA) 
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Figure 27. Image 1 Error 2-D LMS (AR) 




Figure 28. Image 1 Error 2-D FRLS (AR) 
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Figure 29. Image 1 Error 2-D FllLS (ARMA) 




Figure 30. Image 2 Reconstructed 2-D LMS (AR) 
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Figure 31. Image 2 Reconstructed 2-D FRLS (AR) 




Figure 32. Image 2 Reconstructed 2-D FRLS (ARMA) 




Figure 33. Image 2 Error 2-D LMS (AR) 




Figure 34. Image 2 Error 2-D FRLS (AR) 
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Figure 35. Image 2 Error 2-D FIILS (ARMA) 
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V. CONCLUSIONS AND RECOMMENDATIONS 



A Two-Dimensional Fast Recursive Least Squares (2-D FRLS) algorithm was 
developed using a geometric formulation. The derivation is based on the relation 
between least squares prediction and the concepts of orthogonality associated with 
vector spaces. The ordering necessary to develop the recursive algorithm was imposed 
on the data by using a linear scanning index. 

A substantial reduction in computational cost is obtained when compared with 
the basic 2-D RLS algorithm. The 2-D FRLS algorithm requires on the order of 
6 Ah I<2 arithmetic operations per iteration compared with 1.5A'f for the basic RLS, 
where I\\ is the number of channels defined for the 2-D FRLS algorithm and Ah is 
the total number of coefficients in the 2-D filter. The 2-D LMS algorithm, due to 
its simplicity, is still more economical than our algorithm in terms of computational 
cost, but lacks the excellent convergence performance experienced for the 2-D FRLS. 

The work described here could be extended in several different ways. First a 
thorough investigation of the behavior of the algorithm when using finite word length 
implementation as well as different forgetting factors could be developed. Secondly, 
techniques used for the 1-D fast RLS to obtain further reductions of the computa- 
tional cost could be investigated. In particular a variant called the gain normalized 
Fast Transversal Filter [Ref. 6] seems to be applicable to the 2-D FRLS. Its derivation 
however does not follow directly from the geometrical approach presented here. Fi- 
nally, the algorithm could be tested in other areas including 2-D parametric spectral 
estimation. 
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APPENDIX PROJECTION MATRIX UPDATE 



This appendix derives the results 



Pu £ (”) = 



Pu(n — 1) Q 



(290) 



and 



K U£ (n) = 



Ku(n - 1) Q 
-u T Ku(n — 1 ) 1 



(291) 



used in Chapter 3. 

Begin by noting that the data matrix [U(n),7r(n)] can be partitioned as 







0 








0 




[U(n),i(n)] = 


U(n) 


0 








1 





U(n -1)0 
u r 1 



(292) 



where u is the last row of U(n). Pi/ £ (rc) is defined by 

Pu»(n) = [U(n),£(n(] ([U(n).£(n)] r (U( n ), £(«)])'' [U(n),x.(n)J r (293) 

Then using (292) we have 



([U(n),7r(n)] r [U(77).7r(?T)]) J = 



U T (n - l)U(n - 1) + uu 7 u 
• T 1 



u J 



-l 



(294) 



Here we will use the relation for the inverse of a symmetrical matrix by partitioning 
[Ref. 13]. If: 

' A B 
C D 



M = 



(295) 



75 



with C = B 7 and A and D square matrices, then 



with 



M -1 



A -1 + PQ -1 P r -PQ- 1 
-Q^P 7 Q" 1 



P = A -1 B 



and 



Q = D - CP 



For our case we have 



A = U T (n-l)U(n-l) + U u T 
B = u 
C = u T 
D = 1 



(296) 



(297) 

(298) 

(299) 

(300) 



The matrix inversion lemma [Ref. 5, 7] is used first to obtain A 1 as follows. 
The matrix inversion lemma states that if 



A = E -f FG -1 F r 



(301) 



then 



A' 1 



E - E F 



G + F r E -1 F 



-1 



T XT' — 1 



F E 



(302) 



Now taking 



E 



U T (n - l)U(n - 1)] 



F = u 



(303) 

(304) 



G = 1 



(305) 
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we have 



-l 



= E -1 - E -1 u 



1 + u t E -1 u 



= E - 



K=scalar 

E" 1 uu T E-' 



T ir-1 



u E 



1 + I< 



(306) 

(307) 



where K = u T E 1 u. The upper left partition of (296) becomes 
A _1 + PQ _1 P t = A -1 + A _1 u 1— u T A _1 uj 1 u t A -1 



^ E _1 uu T E -1 

E " TTk 

E" 1 uA r ’ 



(308) 

(309) 



+ 



E -1 u — 



(A' + l) 



u J E -1 - 



= E" 1 - 



l+K 

E -1 u u T E -1 E -1 u u T E -1 



A'u T E _1 
1 + 1 < 



l + K 



+ 



= E" 1 = 



1 + A' 
U T (n - l)U(n - l)]" 1 



(310) 

(311) 



To find the other partitions we write 

P = A -1 B = E -1 u — 



-lo 

l+K 



(312) 



E -1 u (U t (tj - l)U(n - 1)) 'u 



1 + A' 



and 



Q = D - CP = 1 - 



l + K 



u T E ! u 1 



1 + A' 1 + I< 



(313) 



thus we have 



-PQ- 1 = (u T (n- l)U(n- 1)) 1 u 



Now substituting (311), (313) and (314) in (296) we obtain 



(314) 



M" 1 = 



(U T (n — l)U(n — 1)) - (U T (n - l)U(n - 1)) u 

-u T (U T (?? - l)U(n - l))’ 1 I< + 1 



(315) 
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Postmultiplving this result by the transpose of (292) we obtain 



Ku» = 



Ku(n-l) 0 
-U T Ku(n - 1) 1 

Finally, premultiplying by (292), (293) becomes 



PUjrfa) = 



Pu(n - 1) Q 

0 T 1 



Q.E.D. 



( 316 ) 



(317) 
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